Parameterize all @pytest.mark.pipeline tests to run with both PANDAS and EXPERIMENTAL_ARROW output formats #2597
base: master
Label error. Requires exactly 1 of: patch, minor, major. Found: |
Co-authored-by: IvoDD <[email protected]>
…format fixture Co-authored-by: IvoDD <[email protected]>
…mat fixture Co-authored-by: IvoDD <[email protected]>
…to use any_output_format fixture Co-authored-by: IvoDD <[email protected]>
Co-authored-by: IvoDD <[email protected]>
…tput_format Co-authored-by: IvoDD <[email protected]>
@pytest.mark.pipeline tests to run both with output_format=OutputFormat.PANDAS and output_format=OutputFormat.EXPERIMENTAL_ARROW. The tests inside test_query_builder.py already do that via the any_output_format fixture. Do si...
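A minimal sketch of how a parameterized fixture like any_output_format can make each requesting test run once per output format. The OutputFormat enum and the test name here are stand-ins written for this illustration; the project's real fixture and enum may be defined differently.

```python
import enum

import pytest


class OutputFormat(enum.Enum):
    """Stand-in for the library's OutputFormat enum (assumption for this sketch)."""

    PANDAS = "PANDAS"
    EXPERIMENTAL_ARROW = "EXPERIMENTAL_ARROW"


@pytest.fixture(params=list(OutputFormat), ids=lambda fmt: fmt.name)
def any_output_format(request):
    # pytest runs every test that requests this fixture once per parameter.
    return request.param


@pytest.mark.pipeline
def test_runs_once_per_format(any_output_format):
    # Hypothetical test body: a real test would call
    # lib.set_output_format(any_output_format) and then read and compare data.
    assert any_output_format in OutputFormat
```

Under pytest this collects two test items, one per format, so no per-test parametrize decorator is needed.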
  assert not object_version_store.has_symbol("sym")
  assert object_version_store.list_snapshots() == {snap: None}
- assert_frame_equal(object_version_store.read("sym", as_of=snap).data, df)
+ assert_frame_equal_with_arrow(object_version_store.read("sym", as_of=snap).data, df)
Do not change tests such as this one, which are not marked with pytest.mark.pipeline.
Only this file has per-test marks; other files should be fine.
def assert_equal_value(data, expected):
    received = data.reindex(sorted(data.columns), axis=1)
    expected = expected.reindex(sorted(expected.columns), axis=1)
    assert_frame_equal(received, expected)
The above reindex operations will not work on a pyarrow.Table
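One possible way to address this, sketched under assumptions: pyarrow.Table has no reindex(), but it does expose column_names and select(), so the column sort can be done per type before comparing. The sort_columns helper is hypothetical and not part of the PR, and the conversion via to_pandas() ignores any index or dtype differences a real helper would need to handle.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def sort_columns(data):
    """Return `data` with its columns ordered by name (hypothetical helper).

    Duck typing is used so that pyarrow is only needed when a Table is passed.
    """
    if hasattr(data, "column_names") and hasattr(data, "select"):
        # pyarrow.Table path: select() returns a new table in the given order.
        return data.select(sorted(data.column_names))
    # pandas.DataFrame path, as in the original helper.
    return data.reindex(sorted(data.columns), axis=1)


def assert_equal_value(data, expected):
    received = sort_columns(data)
    expected = sort_columns(expected)
    # Convert Arrow tables to pandas so a single comparison serves both formats.
    if hasattr(received, "to_pandas"):
        received = received.to_pandas()
    if hasattr(expected, "to_pandas"):
        expected = expected.to_pandas()
    assert_frame_equal(received, expected)
```

With this shape the existing call sites of assert_equal_value stay unchanged, whichever output format the library returns.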
def test_group_on_float_column_with_nans(lmdb_version_store_v1):
def aggregation_test_with_any_output_format(lib, symbol, df, grouping_column, aggs_dict):
Prefer to modify the generic_aggregation_test as it is only used by pipeline tests.
    filter_test_with_any_output_format(lib, symbol, arctic_query, expected)


def filter_test_nans_with_any_output_format(lib, symbol, arctic_query, expected):
Also prefer to modify the generic_filter_test variants rather than creating new ones here.
- def test_lazy_read(lmdb_library):
+ def test_lazy_read(lmdb_library, any_output_format):
      lib = lmdb_library
+     lib.set_output_format(any_output_format)
For the lmdb_library fixture we should use lib._nvs.set_output_format instead. That fixture has type Library, whereas in the other tests the fixture type is NativeVersionStore.
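A minimal sketch of the distinction this comment draws, using stand-in classes rather than the real Library and NativeVersionStore types. The helper set_output_format_for is hypothetical; the only facts taken from the review are that a Library holds its store in _nvs and that set_output_format is called on the store.

```python
class FakeNativeVersionStore:
    """Stand-in for NativeVersionStore (assumption for this sketch)."""

    def __init__(self):
        self.output_format = None

    def set_output_format(self, output_format):
        self.output_format = output_format


class FakeLibrary:
    """Stand-in for Library, which wraps a version store in `_nvs`."""

    def __init__(self):
        self._nvs = FakeNativeVersionStore()


def set_output_format_for(lib, output_format):
    """Set the output format whether `lib` is a Library or a version store."""
    # A Library delegates to its wrapped store; a store is used directly.
    store = getattr(lib, "_nvs", lib)
    store.set_output_format(output_format)
    return store
```

A test using the lmdb_library fixture would then pass the Library object, while tests using a version store fixture would pass the store itself.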
    assert_frame_equal_with_arrow(expected, received, check_dtype=False)


def generic_resample_test_with_arrow_support(
Again, prefer to modify the generic_resample_test_with_empty_buckets from test.py
This PR parameterizes all pipeline tests to run with both OutputFormat.PANDAS and OutputFormat.EXPERIMENTAL_ARROW output formats, ensuring comprehensive testing of the new experimental Arrow output format across the entire pipeline test suite.
Changes Made
Core Test Infrastructure
- Added the any_output_format parameter to all test functions that perform data comparisons in pipeline-marked test files
- Added lib.set_output_format(any_output_format) calls to ensure tests run with the specified output format
- Replaced assert_frame_equal() with assert_frame_equal_with_arrow() for cross-format compatibility
- Replaced np.array_equal() and np.testing.assert_array_equal() with assert_frame_equal_with_arrow() where appropriate
Test Files Updated
- pytestmark = pytest.mark.pipeline
- test_head.py, test_tail.py, test_aggregation.py, test_projection.py
- test_filtering.py, test_row_range.py, test_resample.py
- test_lazy_dataframe.py, test_symbol_concatenation.py, test_ternary.py
- test_query_builder_sparse.py, test_query_builder_batch.py
- test_projection_hypothesis.py, test_filtering_hypothesis.py
- test_basic_version_store.py
Helper Function Adaptations
- aggregation_test_with_any_output_format() for aggregation tests
- filter_test_with_any_output_format() and variants for filtering tests
- row_range_test_with_any_output_format() for row range tests
- resample_test_with_any_output_format() and generic_resample_test_with_arrow_support() for resample tests
Special Cases Handled
- Updated the TestQueryBuilderSparse class to accept any_output_format
- Left @pytest.mark.skip tests unchanged to avoid unnecessary modifications
Testing Impact
With these changes, all pipeline tests now run twice:
- OutputFormat.PANDAS (existing behavior)
- OutputFormat.EXPERIMENTAL_ARROW (new experimental format)
This provides comprehensive coverage of the new Arrow output format across all pipeline operations.
Future Work
As noted in the issue, any tests that fail with internal exceptions when using OutputFormat.EXPERIMENTAL_ARROW should be marked with pytest.xfail in follow-up work once the test failures are identified through CI runs.